25 research outputs found
Efficient parallel computation on workstation clusters
We present novel hardware and software that efficiently implement
communication primitives for parallel execution on workstation
clusters. We provide low communication latencies, minimal protocol
overhead, zero operating system overhead, and high throughput. With this
technology, it is possible to build effective parallel systems
using off-the-shelf workstations. Our goal is to develop a standard
interface board and the necessary software for interfacing any
number of computers, from a single workstation to a cabinet full of
workstation boards.
Latency hiding in parallel systems: a quantitative approach
In many parallel applications, network latency causes a dramatic
loss in processor utilization. This paper examines software
pipelining as a technique for network latency hiding. It
quantifies the potential improvements with detailed,
instruction-level simulations.
The benchmarks used are the Livermore Loop kernels and BLAS Level
1.
These were parallelized and run on the instruction-level RISC
simulator DLX, extended with both a blocking and a pipelined
network. Our results show that prefetching in a pipelined network
improves performance by a factor of 2 to 9, provided the network
has sufficient bandwidth to accept at least 10 requests per
processor.
PSPVM: implementing PVM on a high-speed interconnect for workstation clusters
PSPVM is an implementation of the PVM package on top of ParaStation's
high-speed interconnect for workstation clusters. The ParaStation
system uses user-level communication for message exchange and
removes the operating system from the critical path of message
transmission. ParaStation's user interface consists of a user-level
socket emulation. Thus, we need only minor changes to the standard
PVM package to get it running on the ParaStation system.
Throughput of PSPVM is increased eight times and latency is
reduced by a factor of four compared to regular PVM. The remaining
latency is mainly (88%) caused by the PVM package itself. The
underlying sockets are so fast (25 µs) that the PVM package is
the limiting factor. PSPVM offers nearly the raw performance of
the network to the user and is object-code compatible with regular PVM. As
a consequence, we achieve an application speed-up of four to six over
traditional PVM using regular Ethernet on a cluster of workstations.
The ParaPC/ParaStation project: efficient parallel computing by clustering workstations
ParaStation is a communications fabric for connecting off-the-shelf
workstations into a supercomputer. The fabric employs technology
used in massively parallel machines and scales up to 4096 nodes.
The message passing software preserves the low latency of the fabric
by taking the operating system out of the communication path, while
still providing full protection.
The first implementation of ParaStation using Digital's
AlphaGeneration workstations achieves end-to-end (process-to-process)
latencies as low as 2.5 µs and a sustained bandwidth of more than
10 MByte/s per channel with small packets. Benchmarks using PVM on
ParaStation demonstrate real application performance of 1 GFLOP on
an 8-node cluster.
Prefetching on the Cray-T3E: a model and its evaluation
In many parallel applications,
network latency causes a dramatic loss in processor utilization. This
paper examines software controlled access pipelining (SCAP) as a
technique for hiding network latency. An analytic model of SCAP briefly
describes its basic operation and the achievable performance improvements.
Results are quantified with benchmarks on the Cray-T3E.
The benchmarks used are Jacobi-iteration, parts of the Livermore Loop
kernels, and others representing six different parallel algorithm classes.
These were parallelized and optimized by hand to show the performance
tradeoffs of several pipelining techniques.
Our results show that SCAP on the Cray-T3E improves performance over
a blocking execution by a factor of 2.1 to 38. It also achieves a
speed-up over HPF of at least 12% and up to a factor of 3.1,
depending on the algorithm class.
The ParaStation Project: Using Workstations as Building Blocks for Parallel Computing
The ParaStation communication fabric provides a high-speed communication network with user-level access to enable efficient parallel computing on workstation clusters. The architecture, implemented on off-the-shelf workstations coupled by the ParaStation communication hardware, removes the kernel and common network protocols from the communication path while still providing full protection in a multiuser, multiprogramming environment. The programming interface presented by ParaStation consists of a UNIX socket emulation and widely used parallel programming environments like PVM, P4, and MPI. This allows porting a wide range of client/server and parallel applications to the ParaStation architecture. The first implementation of ParaStation using Digital's AlphaGeneration workstations achieves a communication latency as low as 2.5 µs (process-to-process) and a sustained bandwidth of more than 10 MByte/s per process. Benchmarks using PVM on ParaStation demonstrate real application performance ..
Using Workstations as Building Blocks for Parallel Computing
The key to efficient parallel computing on workstation clusters is a communication subsystem that removes the operating system from the communication path and eliminates all unnecessary protocol overhead. At the same time, protection and a stable multi-user, multiprogrammed environment cannot be sacrificed. We have developed a communication subsystem, called ParaStation2, which fulfills these requirements. Its one-way latency is 14.5 µs to 18 µs (depending on the hardware platform) and throughput is 65 to 90 MByte/s, which compares well with other approaches. We were able to achieve an application performance of 5.3 GFLOP running a matrix multiplication on 8 DEC Alpha machines (21164A, 500 MHz). ParaStation2 offers standard programming interfaces, including PVM, MPI, Unix sockets, Java sockets, and Java RMI. These interfaces allow parallel applications to be ported to ParaStation2 with minimal effort. The system is implemented on a variety of platforms, including DEC Alpha workstations ..